Smart Paper Notes

Scene Text Recognition with Permuted Autoregressive Sequence Models

Darwin Bautista, Rowel Atienza

Categories: Computer Vision | Natural Language Processing

2022-07-14

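Context-aware STR methods typically use internal autoregressive (AR) language models (LMs). Inherent limitations of AR models motivated two-stage methods that employ an external LM. Because the external LM is conditionally independent of the input image, it may erroneously rectify correct predictions, leading to significant inefficiencies. Our method, PARSeq, learns an ensemble of internal AR LMs with shared weights using permutation language modeling. It unifies context-free non-AR inference, context-aware AR inference, and iterative refinement using bidirectional context. Using synthetic training data, PARSeq achieves state-of-the-art (SOTA) results on STR benchmarks (91.9% accuracy) and on more challenging datasets. When trained on real data, it establishes new SOTA results (96.0% accuracy). PARSeq is optimal on accuracy versus parameter count, FLOPS, and latency because of its simple, unified structure and parallel token processing. Due to its extensive use of attention, it is robust to arbitrarily oriented text, which is common in real-world images. Code, pretrained weights, and data are available at: https://github.com/baudm/parseq.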

Translated by Google Translate

Smart Face Shield: A Sensor-Based Wearable Face Shield Utilizing Computer Vision Algorithms

Manuel Luis C. Delos Santos, Ronaldo S. Tinio, Darwin B. Diaz, Karlene Emily I. Tolosa

Categories: Computer Vision

2022-12-18

The study aims to develop a wearable device to combat the onslaught of COVID-19 and to enhance the regular face shield available on the market. It also aims to raise awareness of the health and safety protocols initiated by the government and its affiliates for enforcing social distancing, through the integration of computer vision algorithms. The wearable device was composed of various hardware and software components, such as a transparent polycarbonate face shield, microprocessor, sensors, camera, thin-film-transistor on-screen display, jumper wires, power bank, and the Python programming language. The algorithm incorporated in the study was object detection, a computer vision machine learning technique. The front camera, with OpenCV, determines the distance of a person in front of the user. Using TensorFlow, the system detects the target object in the image or live feed and obtains its bounding box. Determining the distance from the camera to the target object requires the focal length of the lens. To get the focal length, multiply the pixel width by the known distance and divide by the known width (Rosebrock, 2020). Unit testing ensures that the parameters are valid in terms of design and specifications.
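
The focal-length rule quoted in the abstract can be sketched in a few lines of Python. This is a minimal illustration of the triangle-similarity calculation only, not the authors' code: the function names and the calibration numbers are hypothetical, and the second function (inverting the relation to estimate distance) is the usual companion step, assumed here rather than stated in the abstract.

```python
def focal_length_px(pixel_width: float, known_distance: float, known_width: float) -> float:
    """Calibration step: F = (P * D) / W, as described in the abstract."""
    return pixel_width * known_distance / known_width


def distance_to_object(focal_px: float, known_width: float, pixel_width: float) -> float:
    """Assumed companion step: invert the same relation, D = (W * F) / P."""
    return known_width * focal_px / pixel_width


# Hypothetical calibration: an object 0.5 m wide, placed 2.0 m from the camera,
# appears 200 px wide in the image.
focal = focal_length_px(pixel_width=200.0, known_distance=2.0, known_width=0.5)

# Later, the same kind of object appears 100 px wide (e.g. from a bounding box).
distance = distance_to_object(focal, known_width=0.5, pixel_width=100.0)
print(focal, distance)
```

With these made-up numbers, halving the apparent pixel width doubles the estimated distance, which is the behaviour a social-distancing check would rely on.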

A Frequency-Structure Approach for Link Stream Analysis

Esteban Bautista, Matthieu Latapy

Categories: Machine Learning

2022-12-07

A link stream is a set of triplets $(t, u, v)$ indicating that $u$ and $v$ interacted at time $t$. Link streams model numerous datasets, and their proper study is crucial in many applications. In practice, raw link streams are often aggregated or transformed into time series or graphs, on which decisions are then made. Yet it remains unclear how the dynamical and structural information of a raw link stream carries over into the transformed object. This work shows that it is possible to shed light on this question by studying link streams via algebraically linear graph and signal operators, for which we introduce a novel linear matrix framework for the analysis of link streams. We show that, due to their linearity, most methods in signal processing can be easily adopted by our framework to analyze the time/frequency information of link streams. However, the availability of linear graph methods to analyze relational/structural information is limited. We address this limitation by developing (i) a new basis for graphs that allows us to decompose them into structures at different resolution levels; and (ii) filters for graphs that allow us to change their structural information in a controlled manner. By plugging these developments and their time-domain counterparts into our framework, we are able to (i) obtain a new basis for link streams that allows us to represent them in a frequency-structure domain; and (ii) show that many interesting transformations of link streams, like the aggregation of interactions or their embedding into a Euclidean space, can be seen as simple filters in our frequency-structure domain.
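
To make the two aggregations mentioned in the abstract concrete, here is a small Python sketch of a link stream stored as $(t, u, v)$ triplets and reduced either to a time series or to a weighted graph. It is an illustration only and does not implement the paper's matrix framework; the sample data, the function names, and the choice to treat interactions as undirected are all assumptions.

```python
from collections import Counter

# Hypothetical link stream: (time, node, node) triplets.
link_stream = [(0, "a", "b"), (0, "b", "c"), (1, "a", "b"), (2, "a", "c"), (2, "a", "b")]


def to_time_series(stream):
    """Count interactions per time step; structural information is discarded."""
    return dict(Counter(t for t, _, _ in stream))


def to_weighted_graph(stream):
    """Count interactions per node pair; temporal information is discarded.

    Pairs are sorted so that (u, v) and (v, u) map to the same edge
    (an assumption: interactions are treated as undirected).
    """
    return dict(Counter(tuple(sorted((u, v))) for _, u, v in stream))


series = to_time_series(link_stream)
graph = to_weighted_graph(link_stream)
print(series)
print(graph)
```

Each reduction keeps one kind of information and drops the other, which is the loss the abstract sets out to characterize.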

A Whole-Body Controller Based on a Simplified Template for Rendering Impedances in Quadruped Manipulators

Mattia Risiglione, Victor Barasuol, Darwin G. Caldwell, Claudio Semini

Categories: Robotics

2022-08-01

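Quadruped manipulators must be compliant when dealing with external forces during autonomous manipulation, teleoperation, or physical human-robot interaction. This paper presents a whole-body controller that implements Cartesian impedance control to coordinate tracking performance and the desired compliance of the robot base and the manipulator arm. The controller is formulated as an optimization problem using quadratic programming (QP) to impose the desired behavior on the system while satisfying friction cone constraints, unilateral force constraints, and joint and torque limits. The proposed strategy decouples the arm and the base of the platform, enforcing the behavior of a linear double-mass spring-damper system, and allows their inertia, stiffness, and damping properties to be tuned independently. The control architecture is validated through an extensive simulation study using the 90 kg HyQ robot equipped with a 7-DoF manipulator arm. Simulation results show the impedance rendering performance when external forces are applied at the arm's end-effector. The paper presents results for the full-stance condition (all legs on the ground) and, for the first time, shows how impedance rendering is affected by the contact conditions during a dynamic gait.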

GAUDI: A Neural Architect for Immersive 3D Scene Generation

Miguel Angel Bautista, Pengsheng Guo, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht

Categories: Computer Vision | Machine Learning

2022-07-27

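We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera. We tackle this challenging problem with a scalable yet powerful approach, in which we first optimize a latent representation that disentangles radiance fields and camera poses. This latent representation is then used to learn a generative model that enables both unconditional and conditional generation of 3D scenes. Our model generalizes previous works that focus on single objects by removing the assumption that the camera pose distribution can be shared across samples. We show that GAUDI obtains state-of-the-art performance in the unconditional generative setting across multiple datasets and allows for conditional generation of 3D scenes given conditioning variables such as sparse image observations or text that describes the scene.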

Global and Local Features through Gaussian Mixture Models on Image Semantic Segmentation

Darwin Saire, Adín Ramírez Rivera

Categories: Computer Vision

2022-07-19

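The semantic segmentation task aims at dense classification at the pixel level. Deep models have shown progress in tackling this task. However, one remaining problem with these approaches is the loss of spatial precision, often produced at the boundaries of segmented objects. Our proposed model addresses this problem by providing an internal structure for the feature representations while extracting a global representation that supports the former. To fit the internal structure, during training we predict a Gaussian mixture model from the data, which, merged with the skip connections and the decoding stage, helps avoid wrong inductive biases. Furthermore, our results show that we can improve semantic segmentation by giving both learned representations (global and local) a clustering behavior and combining them. Finally, we present results demonstrating our advances on the Cityscapes and Synthia datasets.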

SURIMI: Supervised Radio Map Augmentation with Deep Learning and a Generative Adversarial Network for Fingerprint-based Indoor Positioning

Darwin Quezada-Gaibor, Joaquín Torres-Sospedra, Jari Nurmi, Yevgeni Koucheryavy, Joaquín Huerta

Categories: Machine Learning

2022-07-13

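Indoor positioning based on machine learning has drawn increasing attention in both academia and industry, as meaningful information can be extracted from the reference data. Many researchers are using supervised, semi-supervised, and unsupervised machine learning models to reduce the positioning error and offer reliable solutions to end users. In this article, we propose a new architecture that combines a convolutional neural network (CNN), long short-term memory (LSTM), and a generative adversarial network (GAN) in order to augment the training data and thus improve the positioning accuracy. The proposed combination of supervised and unsupervised models was tested on 17 public datasets, providing an extensive analysis of its performance. As a result, the positioning error was reduced in more than 70% of them.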

Empirical Study of Quality Image Assessment for Synthesis of Fetal Head Ultrasound Imaging with DCGANs

Thea Bautista, Jacqueline Matthew, Hamideh Kerdegari, Laura Peralta Pereira, Miguel Xochicale

Categories: Computer Vision | Machine Learning

2022-06-01

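In this work, we present an empirical study of DCGANs, including hyperparameter heuristics and image quality assessment, as a way to address the scarcity of datasets for investigating fetal head ultrasound. We present experiments showing the impact of different image resolutions, epochs, dataset input sizes, and learning rates on image quality assessment using four metrics: mutual information (MI), Fréchet inception distance (FID), peak signal-to-noise ratio (PSNR), and local binary pattern vector (LBPv). The results show that FID and LBPv have a stronger relationship with clinical image quality scores. The resources to reproduce this work are available at https://github.com/budai4medtech/miua2022.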
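
Of the four metrics listed, PSNR has the simplest closed form, $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, and can be sketched in plain Python as below. This is a generic illustration of the standard definition, not code from the linked repository; the function name, the flat-list image representation, and the sample pixel values are assumptions.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: noise power is zero
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical two-pixel "images" with intensities in [0, 1]; the mean squared
# error is about 0.01, so the result is close to 20 dB.
value = psnr([0.0, 0.0], [0.1, 0.1], max_value=1.0)
print(value)
```

Higher values mean the test image is closer to the reference; identical inputs give infinity.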

UniMorph 4.0: Universal Morphology

Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate

Categories: Natural Language Processing

2022-05-07

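The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage, instantiated, normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation, and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements made on several fronts over the last couple of years (since McCarthy et al. (2020)). Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages. We have implemented several improvements to the extraction pipeline to tackle some issues, such as missing gender and macron information. We have also amended the schema to use the hierarchical structure needed for morphological phenomena like multiple-argument agreement and case stacking, while adding some missing morphological features to make the schema more inclusive. Relative to the previous UniMorph release, we have also augmented the database with morpheme segmentation for 16 languages. Lastly, this new release makes a push towards the inclusion of derivational morphology in UniMorph by enriching the data and annotation schema with instances representing derivational processes from MorphyNet.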

Why-So-Deep: Towards Boosting Previously Trained Models for Visual Place Recognition

M. Usman Maqbool Bhutta, Yuxiang Sun, Darwin Lau, Ming Liu

Categories: Computer Vision | Robotics

2022-01-10

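Deep learning-based image retrieval techniques for loop closure detection demonstrate satisfactory performance. However, it is still challenging to achieve high-level performance with previously trained models in different geographical regions. This paper addresses the problem of deploying such models with simultaneous localization and mapping (SLAM) systems in a new environment. The general baseline approach uses additional information, such as GPS and sequential keyframe tracking, and retrains on the whole environment to enhance the recall rate. We propose a novel approach for improving image retrieval based on previously trained models. We present an intelligent method, MAQBOOL, to amplify the power of pre-trained models for better image recall, and its application to real-time multi-agent SLAM systems. We achieve comparable image retrieval results at a low descriptor dimension (512-D), compared with the high descriptor dimension (4096-D) of state-of-the-art methods. We use spatial information to improve the recall rate of image retrieval with pre-trained models.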
